
November 27, 2023

Can ChatGPT Evaluate Plans?

Xinyu Fu, Ruoniu Wang & Chaosu Li (2023). Can ChatGPT Evaluate Plans?, Journal of the American Planning Association, DOI: 10.1080/01944363.2023.2271893


Abstract

Problem, research strategy, and findings
Large language models such as ChatGPT have recently risen to prominence for producing human-like conversation and assisting with a range of tasks, particularly the analysis of high-dimensional textual materials. Because planning researchers and practitioners often need to evaluate planning documents that are long and complex, a natural question has emerged: Can ChatGPT evaluate plans? In this study we addressed this question by using ChatGPT to evaluate the quality of plans and comparing the results with evaluations conducted by human coders. Across 10 climate change plans, ChatGPT's evaluation results agreed reasonably well (68% on average) with those from the traditional content analysis approach. We then scrutinized the differences through a more in-depth analysis of the ChatGPT and manual evaluations to uncover what might have contributed to the variance. Our findings indicate that ChatGPT struggled to comprehend planning-specific jargon, yet it could reduce human error by capturing details in complex planning documents. Finally, we provide insights into leveraging this cutting-edge technology in future planning research and practice.
Takeaway for practice
ChatGPT cannot yet replace humans in plan quality evaluation. It is, however, an effective complement to human coders: it can minimize human errors by surfacing discrepancies, provided that machine-generated responses are fact-checked. Because ChatGPT generally cannot understand planning jargon, planners wanting to use this tool should take extra care when planning terminology appears in their prompts. Creating effective prompts for ChatGPT is an iterative process that requires specific instructions.
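By way of illustration only (this sketch does not come from the paper; the OpenAI Python client, the model name, and the rubric wording are assumptions), an iterative prompting workflow for scoring a single plan-quality item might look like the following. One would compare the returned score with a human coder's score, fact-check the quoted passages, and then refine the prompt wording and re-run, per the takeaway above.

# Illustrative sketch only: prompting an LLM to score one plan-quality rubric item.
# Assumptions (not from the paper): the "openai" Python client, the model name,
# and the rubric scoring scale below are hypothetical choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_rubric_item(plan_excerpt: str, rubric_item: str) -> str:
    """Ask the model to score a single plan-quality rubric item (0/1/2)."""
    prompt = (
        "You are evaluating the quality of a climate change plan.\n"
        f"Rubric item: {rubric_item}\n"
        "Score 0 if absent, 1 if mentioned vaguely, 2 if detailed and specific.\n"
        "Explain your score in plain language and quote the supporting text.\n\n"
        f"Plan excerpt:\n{plan_excerpt}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # near-deterministic output eases comparison with human coders
    )
    return response.choices[0].message.content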

Keywords

ChatGPT; large language model; natural language processing; plan evaluation; plan quality